Non-parametric Mixture Models for Clustering
نویسندگان
چکیده
Mixture models have been widely used for data clustering. However, commonly used mixture models are generally of a parametric form (e.g., mixture of Gaussian distributions or GMM), which significantly limits their capacity in fitting diverse multidimensional data distributions encountered in practice. We propose a non-parametric mixture model (NMM) for data clustering in order to detect clusters generated from arbitrary unknown distributions, using non-parametric kernel density estimates. The proposed model is non-parametric since the generative distribution of each data point depends only on the rest of the data points and the chosen kernel. A leave-one-out likelihood maximization is performed to estimate the parameters of the model. The NMM approach, when applied to cluster high dimensional text datasets significantly outperforms the state-of-the-art and classical approaches such as K-means, Gaussian Mixture Models, spectral clustering and linkage methods.
منابع مشابه
Spectral Clustering and Embedding with Hidden Markov Models
Clustering has recently enjoyed progress via spectral methods which group data using only pairwise affinities and avoid parametric assumptions. While spectral clustering of vector inputs is straightforward, extensions to structured data or time-series data remain less explored. This paper proposes a clustering method for time-series data that couples non-parametric spectral clustering with para...
متن کاملAdvanced mixtures for complex high dimensional data: from model-based to Bayesian non-parametric inference
Cluster analysis of complex data is an essential task in statistics and machine learning. One of the most popular approaches in cluster analysis is the one based on mixture models. It includes mixture-model based clustering to partition individuals or possibly variables into groups, block mixture-model based clustering to simultaneously associate individuals and variables to clusters, that is c...
متن کاملBayesian non-parametric parsimonious clustering
This paper proposes a new Bayesian non-parametric approach for clustering. It relies on an infinite Gaussian mixture model with a Chinese Restaurant Process (CRP) prior, and an eigenvalue decomposition of the covariance matrix of each cluster. The CRP prior allows to control the model complexity in a principled way and to automatically learn the number of clusters. The covariance matrix decompo...
متن کاملBayesian non parametric inference of discrete valued networks
We present a non parametric bayesian inference strategy to automatically infer the number of classes during the clustering process of a discrete valued random network. Our methodology is related to the Dirichlet process mixture models and inference is performed using a Blocked Gibbs sampling procedure. Using simulated data, we show that our approach improves over competitive variational inferen...
متن کاملA robust wavelet based profile monitoring and change point detection using S-estimator and clustering
Some quality characteristics are well defined when treated as response variables and are related to some independent variables. This relationship is called a profile. Parametric models, such as linear models, may be used to model profiles. However, in practical applications due to the complexity of many processes it is not usually possible to model a process using parametric models.In these cas...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2010